AITopics | adaptive temporal difference learning

Finite-Time Analysis of Adaptive Temporal Difference Learning with Deep Neural Networks

Neural Information Processing SystemsDec-24-2025, 13:47:07 GMT

Temporal difference (TD) learning with function approximations (linear functions or neural networks) has achieved remarkable empirical success, giving impetus to the development of finite-time analysis. As an accelerated version of TD, the adaptive TD has been proposed and proved to enjoy finite-time convergence under the linear function approximation. Existing numerical results have demonstrated the superiority of adaptive algorithms to vanilla ones. Nevertheless, the performance guarantee of adaptive TD with neural network approximation remains widely unknown. This paper establishes the finite-time analysis for the adaptive TD with multi-layer ReLU network approximation whose samples are generated from a Markov decision process. Our established theory shows that if the width of the deep neural network is large enough, the adaptive TD using neural network approximation can find the (optimal) value function with high probabilities under the same iteration complexity as TD in general cases. Furthermore, we show that the adaptive TD using neural network approximation, with the same width and searching area, can achieve theoretical acceleration when the stochastic semi-gradients decay fast.

adaptive td, adaptive temporal difference learning, approximation, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Finite-Time Analysis of Adaptive Temporal Difference Learning with Deep Neural Networks

Neural Information Processing SystemsJan-14-2025, 22:23:01 GMT

Temporal difference (TD) learning with function approximations (linear functions or neural networks) has achieved remarkable empirical success, giving impetus to the development of finite-time analysis. As an accelerated version of TD, the adaptive TD has been proposed and proved to enjoy finite-time convergence under the linear function approximation. Existing numerical results have demonstrated the superiority of adaptive algorithms to vanilla ones. Nevertheless, the performance guarantee of adaptive TD with neural network approximation remains widely unknown. This paper establishes the finite-time analysis for the adaptive TD with multi-layer ReLU network approximation whose samples are generated from a Markov decision process.

adaptive td, adaptive temporal difference learning, approximation, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

Add feedback

Adaptive Temporal Difference Learning with Linear Function Approximation

Sun, Tao, Shen, Han, Chen, Tianyi, Li, Dongsheng

arXiv.org Machine LearningFeb-19-2020

This paper revisits the celebrated temporal difference (TD) learning algorithm for the policy evaluation in reinforcement learning. Typically, the performance of the plain-vanilla TD algorithm is sensitive to the choice of stepsizes. Oftentimes, TD suffers from slow convergence. Motivated by the tight connection between the TD learning algorithm and the stochastic gradient methods, we develop the first adaptive variant of the TD learning algorithm with linear function approximation that we term AdaTD. In contrast to the original TD, AdaTD is robust or less sensitive to the choice of stepsizes. Analytically, we establish that to reach an $\epsilon$ accuracy, the number of iterations needed is $\tilde{O}(\epsilon^2\ln^4\frac{1}{\epsilon}/\ln^4\frac{1}{\rho})$, where $\rho$ represents the speed of the underlying Markov chain converges to the stationary distribution. This implies that the iteration complexity of AdaTD is no worse than that of TD in the worst case. Going beyond TD, we further develop an adaptive variant of TD($\lambda$), which is referred to as AdaTD($\lambda$). We evaluate the empirical performance of AdaTD and AdaTD($\lambda$) on several standard reinforcement learning tasks in OpenAI Gym on both linear and nonlinear function approximation, which demonstrate the effectiveness of our new approaches over existing ones.

adatd, algorithm, approximation, (14 more...)

arXiv.org Machine Learning

2002.08537

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China (0.04)
(5 more...)

Genre: Research Report (0.64)

Add feedback

Filters

Collaborating Authors

adaptive temporal difference learning

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Finite-Time Analysis of Adaptive Temporal Difference Learning with Deep Neural Networks

Finite-Time Analysis of Adaptive Temporal Difference Learning with Deep Neural Networks

Adaptive Temporal Difference Learning with Linear Function Approximation